Estimating Photometric Redshifts of Quasars via K-nearest Neighbor Approach Based on Large Survey Databases
We apply one of the lazy learning methods, the k-nearest neighbor algorithm
(kNN), to estimate the photometric redshifts of quasars, based on various
datasets from the Sloan Digital Sky Survey (SDSS), UKIRT Infrared Deep Sky
Survey (UKIDSS) and Wide-field Infrared Survey Explorer (WISE) (the SDSS
sample, the SDSS-UKIDSS sample, the SDSS-WISE sample and the SDSS-UKIDSS-WISE
sample). The influence of the k value and different input patterns on the
performance of kNN is discussed. The optimal value of k differs for each
combination of input pattern and dataset. The best result
belongs to the SDSS-UKIDSS-WISE sample. The experimental results show that,
in general, the more bands that contribute information, the better kNN
performs in photometric redshift estimation. The results also demonstrate that
kNN using multiband data can effectively mitigate the catastrophic failures in
photometric redshift estimation that affect many machine learning methods.
A comparison of various methods for photometric redshift estimation of
quasars shows that kNN based on a KD-tree is superior, giving the best
accuracy in our case. Comment: 28 pages, 4 figures, 3 tables, accepted for publication in A
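The abstract does not give implementation details, so the following is a minimal pure-Python sketch of kNN regression on a KD-tree, assuming each training quasar is a tuple of photometric features (for example colors) paired with its spectroscopic redshift. The names `build_kdtree`, `k_nearest` and `predict_z` are hypothetical, and the prediction is an unweighted mean of neighbor redshifts, which may differ from the authors' choice.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class KDNode:
    point: Point                 # feature vector, e.g. colors (assumed input pattern)
    z: float                     # spectroscopic redshift of this training quasar
    axis: int                    # coordinate used to split at this node
    left: Optional["KDNode"]
    right: Optional["KDNode"]


def build_kdtree(items: List[Tuple[Point, float]], depth: int = 0) -> Optional[KDNode]:
    """Recursively split on the median of one coordinate, cycling through the axes."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda item: item[0][axis])
    mid = len(items) // 2
    point, z = items[mid]
    return KDNode(point, z, axis,
                  build_kdtree(items[:mid], depth + 1),
                  build_kdtree(items[mid + 1:], depth + 1))


def k_nearest(root: Optional[KDNode], query: Sequence[float], k: int) -> List[Tuple[float, float]]:
    """Return (squared distance, redshift) for the k nearest training points, closest first."""
    heap: List[Tuple[float, int, float]] = []  # max-heap on distance, stored as negated d2
    counter = 0                                # tie-breaker so heap tuples always compare

    def visit(node: Optional[KDNode]) -> None:
        nonlocal counter
        if node is None:
            return
        d2 = sum((a - b) ** 2 for a, b in zip(node.point, query))
        counter += 1
        if len(heap) < k:
            heapq.heappush(heap, (-d2, counter, node.z))
        elif d2 < -heap[0][0]:
            heapq.heapreplace(heap, (-d2, counter, node.z))
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Cross the splitting plane only if a closer point could lie on the far side.
        if len(heap) < k or diff * diff < -heap[0][0]:
            visit(far)

    visit(root)
    return sorted((-neg_d2, z) for neg_d2, _, z in heap)


def predict_z(root: Optional[KDNode], query: Sequence[float], k: int) -> float:
    """kNN regression: average the spectroscopic redshifts of the k nearest neighbors."""
    neighbors = k_nearest(root, query, k)
    return sum(z for _, z in neighbors) / len(neighbors)
```

Because the abstract reports that the best k depends on the dataset and input pattern, a natural usage is to build one tree per sample and evaluate `predict_z` on held-out quasars for several values of k before fixing it.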
Memory-Gated Recurrent Networks
The essence of multivariate sequential learning lies in extracting
dependencies from data. Such data sets, for example hourly medical records in
intensive care units and multi-frequency phonetic time series, often
exhibit not only strong serial dependencies in the individual components (the
"marginal" memory) but also non-negligible memories in the cross-sectional
dependencies (the "joint" memory). Because of the multivariate complexity in
the evolution of the joint distribution that underlies the data generating
process, we take a data-driven approach and construct a novel recurrent network
architecture, termed Memory-Gated Recurrent Networks (mGRN), with gates
explicitly regulating two distinct types of memories: the marginal memory and
the joint memory. Through a combination of comprehensive simulation studies and
empirical experiments on a range of public datasets, we show that our proposed
mGRN architecture consistently outperforms state-of-the-art architectures
targeting multivariate time series. Comment: This paper was accepted and will be published in the Thirty-Fifth
AAAI Conference on Artificial Intelligence (AAAI-21)
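To make the idea of two separately gated memories concrete, here is a toy sketch in pure Python. It assumes a GRU-style convex update and a hypothetical parameterization: each input component keeps its own scalar "marginal" memory that is updated only from that component, while a single "joint" memory is gated by all components at once. This is an illustration of the concept described in the abstract, not the mGRN update equations, which are given in the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


@dataclass
class TwoMemoryCell:
    """Hypothetical toy cell: per-component marginal memory plus one shared joint memory."""
    u: List[float]       # marginal gate weight on x_t[i]
    v: List[float]       # marginal gate weight on m_{t-1}[i]
    a: List[float]       # marginal candidate weight on x_t[i]
    b: List[float]       # marginal candidate weight on m_{t-1}[i]
    w_gate: List[float]  # joint gate weights on all of x_t
    c_gate: float        # joint gate weight on j_{t-1}
    w_cand: List[float]  # joint candidate weights on all of x_t
    c_cand: float        # joint candidate weight on j_{t-1}

    def step(self, x: Sequence[float], m_prev: List[float], j_prev: float) -> Tuple[List[float], float]:
        m_new = []
        for i, xi in enumerate(x):
            # Marginal gate and candidate see only component i and its own past memory.
            g = sigmoid(self.u[i] * xi + self.v[i] * m_prev[i])
            cand = math.tanh(self.a[i] * xi + self.b[i] * m_prev[i])
            m_new.append((1.0 - g) * m_prev[i] + g * cand)
        # Joint gate and candidate mix information across all components.
        gj = sigmoid(sum(w * xi for w, xi in zip(self.w_gate, x)) + self.c_gate * j_prev)
        cand_j = math.tanh(sum(w * xi for w, xi in zip(self.w_cand, x)) + self.c_cand * j_prev)
        j_new = (1.0 - gj) * j_prev + gj * cand_j
        return m_new, j_new


def run(cell: TwoMemoryCell, sequence: Sequence[Sequence[float]], dim: int) -> Tuple[List[float], float]:
    """Unroll the cell over a sequence, starting from zero memories."""
    m, j = [0.0] * dim, 0.0
    for x in sequence:
        m, j = cell.step(x, m, j)
    return m, j
```

The separation shows up directly: changing only the second input component leaves the first component's marginal memory unchanged, while the joint memory responds to it.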
The Causal Learning of Retail Delinquency
This paper focuses on the expected difference in a borrower's repayment when
there is a change in the lender's credit decisions. Classical estimators
overlook the confounding effects, and hence their estimation error can be
large. We therefore propose an alternative approach to constructing the
estimators that greatly reduces this error. The proposed estimators are shown
to be unbiased, consistent, and robust through a combination of theoretical
analysis and numerical testing. Moreover, we compare the power of estimating
the causal quantities between the classical estimators and the proposed
estimators. The comparison is tested across a wide range of models, including
linear regression models, tree-based models, and neural network-based models,
under different simulated datasets that exhibit different levels of causality,
different degrees of nonlinearity, and different distributional properties.
Most importantly, we apply our approaches to a large observational dataset
provided by a global technology firm that operates in both the e-commerce and
the lending business. We find that the relative reduction of estimation error
is strikingly substantial if the causal effects are accounted for correctly. Comment: This paper was accepted and will be published in the Thirty-Fifth
AAAI Conference on Artificial Intelligence (AAAI-21)
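The abstract does not specify the proposed estimators, but the core problem, confounding between the lender's decision and repayment, can be illustrated with synthetic data. The sketch below compares a naive difference in means with a standard backdoor adjustment (stratifying on an observed confounder). The adjustment is a textbook technique swapped in for illustration, not the authors' estimator; the data-generating process and all names are hypothetical.

```python
import random
from typing import List, Tuple

Row = Tuple[int, int, float]  # (risk group z, credit decision d, repayment y)


def simulate(n: int, seed: int = 0) -> List[Row]:
    """Hypothetical loan data where risk group z drives both the decision d and repayment y.

    The true causal effect of d on y is 1.0 by construction.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        d = 1 if rng.random() < (0.8 if z else 0.2) else 0
        y = 1.0 * d + 3.0 * z + rng.gauss(0.0, 1.0)
        rows.append((z, d, y))
    return rows


def mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


def naive_effect(rows: List[Row]) -> float:
    """Difference in mean repayment between decisions, ignoring the confounder z."""
    return mean([y for _, d, y in rows if d == 1]) - mean([y for _, d, y in rows if d == 0])


def adjusted_effect(rows: List[Row]) -> float:
    """Backdoor adjustment: within-stratum differences weighted by the share P(z) of each stratum."""
    total = 0.0
    for z_val in (0, 1):
        stratum = [r for r in rows if r[0] == z_val]
        diff = (mean([y for _, d, y in stratum if d == 1])
                - mean([y for _, d, y in stratum if d == 0]))
        total += diff * len(stratum) / len(rows)
    return total
```

Under this setup the naive estimate converges to 1.0 + 3.0 × (0.8 − 0.2) = 2.8, nearly three times the true effect, while the adjusted estimate converges to 1.0. This is the kind of error reduction the abstract attributes to accounting for causal effects correctly.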
SDSS quasars in the WISE preliminary data release and quasar candidate selection with optical/infrared colors
We present a catalog of 37,842 quasars in the SDSS Data Release 7, which have
counterparts within 6" in the WISE Preliminary Data Release. The overall WISE
detection rate of the SDSS quasars is 86.7%, and it decreases to less than
50.0% when the quasar magnitude is fainter than . We derive the median
color-redshift relations based on this SDSS-WISE quasar sample and apply them
to estimate the photometric redshifts of the SDSS-WISE quasars. We find that by
adding the WISE W1- and W2-band data to the SDSS photometry we can increase the
photometric redshift reliability, defined as the percentage of sources with the
photometric and spectroscopic redshift difference less than 0.2, from 70.3% to
77.2%. We also obtain the samples of WISE-detected normal and late-type stars
with SDSS spectroscopy, and present a criterion in the versus
color-color diagram, , to separate quasars from stars.
With this criterion we can recover 98.6% of 3089 radio-detected SDSS-WISE
quasars with redshifts less than four and overcome the difficulty in selecting
quasars with redshifts between 2.2 and 3 from SDSS photometric data alone. We
also suggest another criterion involving the WISE color only, , to
efficiently separate quasars with redshifts less than 3.2 from stars. In
addition, we compile a catalog of 5614 SDSS quasars detected by both WISE and
UKIDSS surveys and present their color-redshift relations in the optical and
infrared bands. By using the SDSS , UKIDSS YJHK and WISE W1- and W2-band
photometric data, we can efficiently select quasar candidates and increase the
photometric redshift reliability up to 87.0%. We discuss the implications of
our results for future quasar surveys. An updated SDSS-WISE quasar catalog
consisting of 101,853 quasars with the recently released WISE all-sky data is
also provided. Comment: 27 pages, 9 figures and 5 tables. Revised to match the published
version in the Astronomical Journal. 5 tables are available electronically at
(http://vega.bac.pku.edu.cn/~wuxb/sdsswiseqso.htm). A new SDSS-WISE quasar
catalog consisting of 101,853 quasars with the WISE all-sky data is available
as Table
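The abstract derives median color-redshift relations, uses them to estimate photometric redshifts, and defines reliability as the percentage of sources with |z_phot − z_spec| < 0.2. A minimal sketch of that pipeline follows, assuming the photometric redshift is the redshift-bin center whose median colors minimize a chi-square against the observed colors. The bin width, the choice of colors, and the function names are hypothetical, and the paper's exact fitting procedure may differ.

```python
import statistics
from typing import Dict, List, Sequence


def median_relation(z_spec: Sequence[float], colors: Sequence[Sequence[float]],
                    bin_width: float = 0.1) -> Dict[float, List[float]]:
    """Median of each color in each redshift bin, keyed by bin center (hypothetical binning)."""
    bins: Dict[int, List[Sequence[float]]] = {}
    for z, c in zip(z_spec, colors):
        bins.setdefault(int(z // bin_width), []).append(c)
    relation = {}
    for idx, members in sorted(bins.items()):
        center = (idx + 0.5) * bin_width
        relation[center] = [statistics.median(col) for col in zip(*members)]
    return relation


def photo_z(colors: Sequence[float], errors: Sequence[float],
            relation: Dict[float, List[float]]) -> float:
    """Return the bin center whose median colors minimize chi-square for this source."""
    def chi2(median_colors: List[float]) -> float:
        return sum(((c - m) / e) ** 2 for c, m, e in zip(colors, median_colors, errors))
    return min(relation, key=lambda zc: chi2(relation[zc]))


def reliability(z_phot: Sequence[float], z_spec: Sequence[float], tol: float = 0.2) -> float:
    """Percentage of sources with |z_phot - z_spec| < tol, as defined in the abstract."""
    hits = sum(1 for p, s in zip(z_phot, z_spec) if abs(p - s) < tol)
    return 100.0 * hits / len(z_spec)
```

Adding the WISE W1 and W2 bands in this scheme simply lengthens each color vector, which is how extra bands can sharpen the chi-square minimum and raise the reliability.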
Selecting Quasar Candidates by a SVM Classification System
We develop and demonstrate a classification system composed of several
Support Vector Machine (SVM) classifiers, which can be applied to select
quasar candidates from large sky survey projects such as SDSS, UKIDSS and GALEX.
We describe in detail how to construct this SVM classification system. When
the SVM classification system works on the test set to predict quasar
candidates, it achieves an efficiency of 93.21% and a completeness of
97.49%. To further demonstrate the reliability and feasibility of this
system, two chunks are randomly chosen to compare its performance with that of
the XDQSO method used for SDSS-III's BOSS. The experimental results show a
high fraction of overlap between the quasar candidates selected by
this system and those extracted by the XDQSO technique in the dereddened i-band
magnitude range between 17.75 and 22.45, especially in the interval of
dereddened i-band magnitude < 20.0. In the two test areas, 57.38% and 87.15% of
the quasar candidates predicted by the system are also targeted by the XDQSO
method. Similarly, the predicted subcategories of quasars, divided by
redshift, show a high level of overlap between the two approaches. Given
the effectiveness of this system, the SVM classification system can be used
to create the input catalog of quasars for the GuoShouJing Telescope (LAMOST)
or other spectroscopic sky survey projects. To obtain quasar candidates with
higher confidence, the candidates selected by this SVM system can be
cross-checked against those selected by the XDQSO method. Comment: 11 pages, 4 figures and 7 tables, MNRAS accepted
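The figures quoted in this abstract (efficiency, completeness, and the fraction of SVM candidates also targeted by XDQSO) are set ratios. The sketch below uses the usual astronomical definitions, assumed here: efficiency as TP / (TP + FP) and completeness as TP / (TP + FN). Object identifiers are hypothetical strings, and the cross-check simply keeps candidates selected by both methods.

```python
from typing import Set


def efficiency(selected: Set[str], true_quasars: Set[str]) -> float:
    """Fraction of selected candidates that are genuine quasars: TP / (TP + FP)."""
    return len(selected & true_quasars) / len(selected)


def completeness(selected: Set[str], true_quasars: Set[str]) -> float:
    """Fraction of genuine quasars that were selected: TP / (TP + FN)."""
    return len(selected & true_quasars) / len(true_quasars)


def overlap_fraction(svm_candidates: Set[str], xdqso_targets: Set[str]) -> float:
    """Fraction of SVM candidates that are also XDQSO targets in the same area."""
    return len(svm_candidates & xdqso_targets) / len(svm_candidates)


def high_confidence(svm_candidates: Set[str], xdqso_targets: Set[str]) -> Set[str]:
    """Cross-check: keep only candidates selected by both methods."""
    return svm_candidates & xdqso_targets
```

Per-chunk values such as 57.38% and 87.15% in the abstract correspond to `overlap_fraction` evaluated on each test area. Intersecting the two candidate lists trades some completeness for higher confidence.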
Screening Genes Promoting Exit from Naive Pluripotency Based on Genome-Scale CRISPR-Cas9 Knockout
Two of the main problems in stem cell and regenerative medicine research are exit from pluripotency and differentiation into functional cells or tissues. Answering these two questions is of great value for the clinical translation of stem cell and regenerative medicine research. Although a growing body of research has revealed much about how pluripotency is maintained, the mechanisms underlying pluripotent cell self-renewal, proliferation, and differentiation into specific cell lineages or tissues have yet to be defined. To this end, we took full advantage of a novel technology, genome-scale CRISPR-Cas9 knockout (GeCKO). As an effective way of introducing targeted loss-of-function mutations at specific sites in the genome, GeCKO enabled us to screen, in an unbiased manner and for the first time, for key genes that promote exit from pluripotency in mouse embryonic stem cells (mESCs). In this study, we successfully established a GeCKO-based model to screen for key genes in pluripotency withdrawal. Our strategies included lentiviral packaging and infection, lenti-Cas9 gene knockout, shRNA gene knockdown, next-generation sequencing, model-based analysis of genome-scale CRISPR-Cas9 knockout (MAGeCK analysis), GO analysis, and other methods. Our findings provide a novel approach for large-scale screening of genes involved in pluripotency exit and offer an entry point for cell fate regulation research.